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ABSTRACT. A standard approach to confidence intervals for quantiles requires 
good estimates of the quantile density. The optimal bandwidth for kernel esti¬ 
mation of the quantile density depends on an underlying location-scale family 
only through the quantile optimality ratio (QOR), which is the starting point for 
our results. While the QOR is not distribution-free, it turns out that what is 
optimal for one family often works quite well for families having similar shape. 
This allows one to rely on a single representative QOR if one has a rough idea 
of the distributional shape. Another option that we explore assumes the data 
can be modeled by the highly flexible generalized lambda distribution (GLD), 
already studied by others, and we show that using the QOR for the estimated 
GLD can lead to more than competitive intervals. Confidence intervals for the 
difference between quantiles from independent populations are also considered, 
with an application to heart rate data. 
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1 Introduction 


This work is motivated by the need for good interval estimates for quantiles when one has 
only a rough idea of the underlying shape, and builds on substantial theory already in the 
literature. Here we concentrate on asymptotic intervals of the simple form provided shortly 
in (JTJ) , usually adding and subtracting two standard errors to the quantile estimate. The 
challenge is to obtain a good ‘distribution-free’ estimator of the standard error, so as to obtain 
intervals for quantiles with accurate coverage for all distributions having a similar shape. 
There is also an extensive literature on completely distribution-free confidence intervals for 


quantiles and, as a point of departure, we refer the reader to Serfling (1980, Sec.2.6) for exact 


and asymptotic distribution-free intervals based on order statistics and to DasGupta (2006 
Ch.29) for bootstrap intervals. 

Denote the quantile function associated with a cumulative distribution function F by 
Q[u) = F _1 (u) = inf'{re : F{x) > u}, for 0 < u < 1. Assuming F has a positive derivative 
F'(x ) = f(x) on its domain define q{u) = Q'(u ) = 1 /f(Q{u)) to be the quantile density 


function by Parzen (1979), and earlier dubbed the ‘sparsity index’ by Tukey (1965). A point 


estimate of Q{u) based on a sample X\,, X n of size n from F is the sample quantile 
Q n (u ) = F~ l {u), where F n (x ) is the usual empirical distribution function. Letting t 2 = 


u(l — u)q 2 (u), it can be shown (DasGupta, 2006, Ch.7) that the Studentized quantile is 


asymptotically normal: y/n{Q n (u ) — Q{u)}/t u —> N(0, 1) in distribution. This leads to a 
large sample 100(1 — a)% confidence interval for Q(u ) of the form 


Qn( tt) i m— a/2 


T u 


n 


( 1 ) 


where z a = $ _1 (a:) and $ is the standard normal distribution function. To make this 
confidence interval for Q(u ) distribution-free, one needs to replace r u in ([Tj) by a consistent 
estimator t u , which in turn requires a consistent estimator of the quantile density q{u). Thus 
there are two sources of error in the interval ([!]), that due to estimation of the center and 
that due to estimation of the width. iSheather & Marron ([1990) have compared several 
asymptotically optimal kernel estimators Q(u) of Q(u ) and found 

“... that apart from the extreme quantiles, there is little difference between various 
quantile estimators (including the sample quantile). Given the well-known distribution- 
free inference procedures (e.g., easily constructed confidence intervals) associated with 
the sample quantile, as well as the ease with which it can be calculated, it will often be 


a reasonable choice as a quantile estimator.” - Sheather & Marron (1990). 


Here we estimate the quantiles x u = Q{u) for u G [0.05,0.95] using a linear combination of 
adjacent order statistics x u , the Type 8 version of the quantile command on the package R 
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Core Team (2014) recommended by Hyndman & Fan (1996). Since estimation of the quantile 


density q(u) is more difficult, with MSE[g(w)] = 0(n -4 / 5 ) and MSE[Q(w)] = 0(n _1 ), the 
more important estimate in the confidence interval (JTJ) is not the center, but the width of the 
interval, which requires an estimate of q(u). 


Jones (1992) makes a strong case for estimating q{u) directly by kernel methods, rather 
than by taking the reciprocal of a kernel estimate of f(x u ). It turns out that an asymptotically 
optimal choice of bandwidth for estimating q(u) only depends on what we choose to call the 
quantile optimality ratio QOR(w) = q{u)/q"{u), so this is the object of our attention in 
Section [2j Welsh 


suggests estimating q{u ) and q"{u ) separately and taking the ratio of 


these estimates to find the optimal bandwidth; and Cheng & Sun (2006) do so in computing 


the mean-squared error of estimators of Q{u). While ideally one would consistently estimate 
the QOR at u, it turns out that one only needs a rough estimate of it to obtain good confidence 
intervals for x u . 

Our goal is finding conceptually and computationally simple distribution-free confidence 
intervals for x u , and propose using either one of the optimal bandwidth kernel estimators for a 
representative location-scale family, or to use an adaptive estimator for the generalized Tukey 
A families. For the latter, our work is motivated by recent research from several authors. In 
particular we briefly describe in Section [3] the methods of Su (2009) along with those of Soni 


et al. (2012a). We compare the finite sample performance of their confidence intervals with 


ours in Section [4] and also discuss extension to two-population comparisons. We conclude 
with an example in Section [4] and a discussion and summary in Section [5] 


2 Quantile optimality ratios 


For background material on kernel density estimation see Wand & Jones (1995). 


2.1 Quantile density estimators based on optimal bandwidth 

An appealing and simple to implement quantile density estimator is a kernel density estimator 
which can be expressed as a linear combination of order statistics: 


?(«) = X (*) j kh ( u 


i =1 


i) 


n 


k b [u - 

n 


( 2 ) 


where b is a bandwidth and kb(-) = k(- — b)/b for some kernel function k which is an even 
function on [—1,1] that has variance cr£ = f x 2 k(x)dx and roughness k — f k 2 (y)dy. The 
asymptotic properties of this quantile density estimator have been studied by Falk (1986), 
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Welsh (1988) and Jones (1992), and the last of these gives the asymptotic MSE of q(u) as: 


MSE[g(n)] 


4 bn 


( 3 ) 


By differentiating (J3| with respect to b one Ends that MSE[g(n)] has a minimum when the 
bandwidth b{u ) = A(u) /n 1//5 , where 


A(u) 



( 4 ) 


Therefore an asymptotically optimal choice of bandwidth (|d]) for estimating q{u) only depends 
on the underlying distribution through the QOR(w) = q{u)/q''{u). Note that if F a ^{;x) = 
F(y(x—a)/b ), for unknown a and b > 0, is the location-scale family generated by F — F 0i i, then 
the quantile function for is Q a ,b(u) = a+bQ(u) and the quantile density is q a ,b(u) = bq(u)\ 
thus the quantile density is location-invariant and scale equivariant, and the QOR is both 
location and scale invariant. 


Remarks on derivation of the QOR 

The first two derivatives of the quantile density q are: 

</(«) = ~ = ~f(xu)q 3 (u) 

q'\u) = 3 {f\x u )} 2 q 5 (u)-r(x u )q\u) 


( 5 ) 


In many cases, f'(x) = g(x)f(x). (For example, in the Pearson systems of distributions, see 
Johnson et al. (1994, Chi.), g{x) is a rational function whose numerator is linear in x and 
denominator is quadratic in x.) For such /, we have f"(x) = {g'(x ) + g 2 (x)} f(x) so that 
from (j5j) one Ends q'(u ) = — g(x u )q 2 (u ) and q'\u ) = {2g 2 (x u ) — g'(x u )}q 3 (u)] thus the quantile 
optimality ratio becomes: 


QOR(w) = 


P(Xv 


q(u) = _ 

q"(u) {2 g 2 (x u ) - g'(x u )} 


( 6 ) 


In this case a/QOR(w) is the product of the density quantile function f(x u ) = 1 /q{u) defined 


by Parzen (1979) and a function depending only on g and its derivative composed with the 


quantile function. Perhaps it is worth noting that the score function J{u) = — f'(x u )/f(x u ) = 
—g(x u ) of classical nonparametric statistics is defined in terms of g(x u ). 

It is informative to examine some QOR’s before looking at specific methods for estimation 
of x u in Section [3j because they are much better behaved then we expected. 
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2.2 Examples of the quantile optimality ratio 

The quantile density for the uniform distribution is a positive constant, so its derivatives are 
0, and the QOR(u) = +oo for all 0 < u < 1; thus the optimal bandwidth result does not 
apply in the uniform case. 

Normal. 

Letting z u = Q$(u) = <f )_1 (-u), the quantile density function is q^(u) = l/(p(z u ), with 
first two derivatives q^(u) = z u /p 2 (z u ) = Q$(u) q%(u) and q$(u) = q%(u)(l + 2Q%(u)). 
The graph of the QOR is in the bottom left plot of Figure [TJ This ratio can be written 
\/2 tp(\/2 x u )/{y/n (1 + 2x1)}. As an aside, it also has a simple approximation QOR<$,(u) ~ 
0.4 ip[Q(u — 0.5)) for u real, which perhaps explains why it appears to be a normal density. 

In Figure [I] we also plot the QOR for Cauchy, Laplace and Tukey(A) distribution with 
A = 2.5. For symmetric distributions that are unimodal and positive for all x the graphs 
of q(u) and q"{u) are U-shaped, symmetric about 1/2 and unbounded, while q'(u) is skew- 
symmetric about 1/2 and unbounded. However, the ratio QOR= q(u)/q"{u) is bounded and 
even recognizable: a first glance at Figure [l] suggests that we are plotting density quantiles 
of some well-known distributions, when in actual fact they are the plots of the graphs of the 
quantile optimality ratios. 

Lognormal. 

For x > 0 let F(x) = <3?(In(a;)) be the log-normal distribution function, and recall q$(u) = 

l/<p(z u ) is the quantile density of the normal distribution. It follows that f{x) = F'(x) = 

(p( \n(x))/x. Therefore x u = Q(u) = exp{^ u } for 0 < u < 1 and the quantile density and its 
first two derivatives are 

q(u) = -j2— = Q( u )q#(u) (7) 

q(u) = q{u) q$(u) + Q(u) q'^(u) 

q\u) = q\u)q<s,(u) + 2q(u)q$(u) + Q(u)q'l(u) (8) 

The graph of the quantile optimality ratio QQR(w) is shown in the top left plot of Figure [2j 
For comparison, also appearing in Figure [2] are the QORs for the Pareto type II, the Gamma 
and the Weibull distributions, each with three parameter choices. 
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Figure 1: Plots of the QOR functions for some symmetric distributions. Formulae for the 
QORs can be found in Table [lj 
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Figure 2: Plots of the QOR functions for some asymmetric distributions. 
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Generalized Lambda Distribution. 


We adopt the parameterisation of Freimer et al. (1988) (often referred to as the FKML 
parameterisation), which has quantile function determined by a location parameter Ai, an 
inverse scale parameter A 2 > 0 and two shape parameters A 3 and A 4 : 


Q{u) — Ai + — 


u 


^3 


1 (i-«) 


A4 




A4 


(9) 


The quantile optimality ratio QOR(w) is location and scale free so without loss of generality 
we can take Ai = 0 and A 2 = 1 in (|9]) . The QOR is easily found and shown in Table [I] 
(see GLD-FKML); details are left to the reader. The RS parameterisation (RS Ramberg & 
Schmciser, 1974) is sometimes also used and this is also included in the table. 

This generalized Tukey family of distributions can be used to approximate a large number 


of others, as described in Freimer et al. (1988), Karian & Dudewicz (2000) and Gilchrist 


(2000). Thus an appealing approach is to estimate the parameters A 3 and A 4 and then use 


the QOR for this family to estimate the optimal bandwidth when estimating q(u) and hireling 
a confidence interval for x u = Q(u). 


Other examples. 

More examples of the QOR are provided in Table [l] and derivations of these QORs are in the 
Appendix (Section [ 6 ]). 


3 Methods for interval estimates of quantiles 

In the previous section we showed that optimal bandwidths only depend on the QOR, and 
that the latter can vary greatly in shape, depending on the underlying location-scale family. 
Ideally one would find a good distribution-free estimate of the QOR, and use that to choose an 
approximately optimal bandwidth that gives good finite sample results. However, attaining 
this ideal is not necessary. 

After inspection of the data through histograms or density plots, one is often willing to 
assume something about the underlying distribution, such as unimodality and symmetry on 
infinite support or unimodality and support on a half-open interval [0, 00 ). In the former 
case, choosing the optimal Cauchy, say, bandwidth can lead to relatively small MSEs of 
finite sample estimates of q(u) for data generated from Cauchy or other symmetric unimodal 
distributions. I 11 the latter case, on can assume a Pareto model, estimate the shape parameter, 
and then proceed to estimate the QOR and optimal QOR bandwidth for this model to obtain 
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Family 


QOR(it) 


i -i 


(2irV3){l + 3x 2 u } 

u 2 /2, u < 0.5 

(1 — u) 2 /2, u > 0.5. 


v / 2y?(v / 2 z u ) 


where z u = 1 (u). 


Cauchy 
Laplace 

Normal 
Lognormal 
Logistic 
Bimodal 
Tukey(A) 

Pareto(a) 

Gamma (a) 

Exponential 

Weibull (/3) See (14) in the Appendix. 

ni^3 1 t 1 — lf) X ^ ^ 

gld-fkml(a 3 ,a 4 ) , -—--^4- {. -—-- 

u x 3 3 (A 3 - 2)(A 3 - 1) + (1 - u ) x 4 3 (A 4 - 2)(A 4 - 1) 

GTDRSiA A 1 _ Xsu^- 1 + A 4 (l - u)^ 1 _ 

GLD-RS(A 3 , 4) ^~ 3 (A 3 - 2)(A 3 - 1)A 3 + (1 - u)^- 3 (A 4 - 2)(A 4 - 1)A 4 

Table 1: Quantile optimality ratios (QORs) for various distributions. The normal, lognormal and GLD 
QORs are discussed in Section\2Ji \ with further examples in the Appendix. 


V^(l + 2 z l) j- 
Use formulae (7) and (|8|). 

HI - n )} 2 

2{(1 — u) 3 + u 3 } 

0.25 

U X ~1 + (1 — u) X ~ l 

(A — 1)(A — 2){u A_3 + (1 — u) A_3 } 
q 2 (l — u) 2 
(11 fl)( l + 2a) 


See (13) in the Appendix. 


(1 - u ) 2 /2 


the estimate of q(u). The optimal QOR for the log-normal distribution can lead to good 
coverage of intervals for quantiles for many unimodal skewed distributions on [ 0 , oo). 

Another possibility examined here is restricting attention to a larger parametric family of 
distributions, such as the four-parameter generalized lambda model, estimate its parameters 
and then proceed to estimate q(u) as though this were the true distribution. 



















3.1 Methods based on optimal QOR bandwidths 


These methods follow the Parzen (1979), Falk 


, Welsh (1988) and Jones (1992) method 


of estimating q(u) by q(u) as defined in ([ 2 J with an optimal bandwidth b{u) = A(u)/n 1//5 , 
where A{u) is determined by (JtJ) and the Epanechnikov kernel. Thus A{u) = 1.718 QOR(u) 2 / 5 
where the optimality ratio QOR(u) depends on a fixed location-scale family, see Table [I] 


Method A: Representative models. 

For symmetric unimodal densities with infinite support, a single QOR determined by the 
Cauchy distribution, say, can lead to good interval estimators of x u = Q(u) of the form ([TJ 
for many of them, and lead to reliable confidence intervals, as will be shown in Section |4.1| 
Similarly, if a density is known to have a shape that is skewed to the right on [0, 00 ), the 
QOR that is log-normal for this shape can lead to good interval estimators of quantiles for 


similarly shaped distributions, even Pareto and exponential models, see Section 4.2 


Method B: Simple adaptive parametric models. 

In the case of symmetric unimodal densities one could assume the 3-parameter symmetric 
Tukey(/i, a, A) family, and estimate A with the data before choosing the QOR function that is 
optimal for this model to obtain the bandwidth, estimate q x (u) and then find the associated 
interval for x u . See Ch. 7 of Gilchrist] (2000) for a discussion and references. 

In the case of skewed-right densities on [0, 00 ), if one is willing to assume for simplicity a 
Pareto type II model with known scale 1, then an adaptive procedure which first estimates the 
shape parameter a by the maximum likelihood estimate and then uses the optimal bandwidth 
for the estimated model is an option. The QOR a (w) = a 2 (l — w) 2 /{(l + a)(l + 2a)} and the 


MLE a = n/^Tln(l + -W). The performance of this estimator is studied in Section 4.2 


Extensions to the more practical case of unknown scale and shape parameters is possible, see 


Ch.20 of Johnson et al. (1994) and the R package maxlik. 


3.2 Methods based on the generalized lambda distribution 

It is tempting to restrict attention to the 4-parameter generalized lambda distribution (GLD) 
because it approximates a wide range of symmetric and asymmetric distributions; this model 
requires implicitly defined estimates of the parameters, now possible with most software 
packages. This model has two drawbacks: an explicit expression for the estimated density 
is not available except in special cases, and if the underlying distribution is not a member 
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of the GLD family the methods will not be consistent for estimating the quantiles. Methods 


C,D and E will be compared in Section 4.3 


Method C 


Su (2009) suggests taking / to be the density of the Generalized Lambda Distribution 


(GLD) which depends on the location parameter Ai, inverse scale parameter A 2 and the 
shape parameters A 3 and A 4 . Given estimates of Ai,...,A 4 which give rise to estimated 
GLD quantile Qc( u )i estimated GLD density fc and estimated standard error fc/\Jn = 
y/u{ 1 — u) f c (Qc(u))/V^ °f the GLD quantile density, one can construct a confidence in¬ 
terval Q for x u . This approach is called the Normal-GLD method by Su (2009) and we will 
call it Method C. 


Method D 


Su (2009) also introduced a method for estimating x u = Q(u ) based on the estimated GLD 


cumulative distribution function F. Let m = floor(mt) and Fjg(-;m + 1 ,n — m) be the 
cumulative distribution function for the Beta B(m+l,n — m) distribution. One then defines a 
l—a confidence interval [L, U] by taking the solutions to a/2 = Fjs{F{L)\m+ 1, n—m ) and 1 — 
a/2 = Fq{F(U ); m + 1, n — m). Such root-finding is routine on most statistical packages. For 


justification of this approach, see Su (2009), who calls this the Analytical-GLD approach; here 


we call it Method D. Su (2009) provided simulated evidence of typically improved performance 


of Methods C and D when compared to popular bootstrap counterparts. 


Method E 

In the spirit of Method B we propose an adaptive GLD method that first estimates the 
4 parameters of the GLD distribution and employs the optimal QOR bandwidth for this 
estimated model in finding the confidence interval. A caveat is that any adaptive approach 
to bandwidth selection such as Method E may not minimize MSE as well as expected because 
the bandwidth is now random and the derivation of the optimal bandwidth assumes it only 
depends of n, u , the kernel and the underlying distribution. It could turn out that using an 
optimal bandwidth for a fixed distribution with shape near that of the underlying distribution 
leads to better finite sample results than one which estimates unknown parameters first. 
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3.3 Nonparametric methods 


Methods F,G and H below are due to Soni et al. (2012a) and are examined briefly in Sec¬ 
tion 14.41 


Method F 

The ‘reciprocal’ approach to estimating q(u) = 1 /f(Q(u)) is to first estimate Q(u) by 


Q(u) = 


pi/n 

£*«> /. 

J (i— 1). 


k b (u - y) dy 


( 10 ) 


(*-l)/n 


and f{x) by f(x) = (^T hb{Xj — x))/n, where b = b n (u) is the bandwidth, 
denotes this estimator q(u) = 1 /f(Q(u)) and 


Soni et al. 


Jones 


(1992) 


(2012b) names it q 3 n ' (u). Here we call 


it qp(u). Method F for finding a confidence interval for x u = Q(u ) consists of substituting 
(Jf(u) for q{u) in (JTj). 


Method G 


Rather than estimate f(Q(u)) and taking its reciprocal, Jones (1992) proposed directly esti¬ 
mating q(u) by (J 2 J) which he called q(u). Soni et al. (2012b) denoted it ^ 2 (u) and hereafter 
we call it qp(u). Method G consists of substituting qc(u) for q(u) in (JT|) . The main conclusion 


of the asymptotic MSE results of 

Jones 

(1992 

!) is that Method G is preferable to Method F, 

except perhaps for quantiles in the center. 

3 oni et al. 

(2012b 

) assert on the basis of their 


simulations that the reverse is true. 


Method H 


Soni et al. (2012b) proposed Method H, which consists of substituting (Jh(u) for q{u) in ([Tj), 
where 

^ 1 kb (Si — u) 

Qh{u) = -^ 


n 


1=1 


( 11 ) 

/(*«) 

and Si is the proportion of observations less than or equal to X^, kb(-) = k(-/b)/b is a kernel 
function and / is the usual kernel density estimator of / defined in Method A. 


Soni et al. (2012b) show via simulations that for certain distributions and selected choices 


of u ranging from 0.2 to 0.9 that q H was superior to qp and qc- Interestingly, they did not 
try to optimize the bandwidth b = b n (u ) but chose it arbitrarily to be a constant 0.15, 0.19 
and 0.25 which led to similar results, so they only reported those for 0.19. 


11 
































Cauchy 


Normal 




Laplace 



Logistic 



Figure 3: n=400 Estimated coverage probabilities for x u = Q(u) for u = 0.05 : 0.95/0.05 
based on 10,000 replicates of four optimal 95% confidence intervals. The intervals are based 
on optimal QOR’s for the Cauchy (solid lines), the normal (dashed lines), the Laplace (dotted 
lines) and the logistic (dot-dashed) lines. The four plots are for data generated assuming these 
respective models. 


4 Simulation Studies 


All studies that follow were carried out with the R statistical software package (R Core 


Team 2014). Specifically, to obtain the numerical maximum likelihood estimates for the GLD 


distribution, we used the gld package (King et al, 2014). The package also includes many 


other estimators and the reader is referred to Corlu & Meterclliyoz (2015) for a discussion of 


the performance of many available estimators. Additionally, other packages are also available 


that may be used to estimate the GLD parameters (for e.g., the GLDEX package Su et al. 


2007). 


In this section we present several simulation studies that emphasize the usefulness of the 
optimal QOR approach in many situations and provide additional evidence that Method C 
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of Su (2009) can provide very good results, in particular when n is small. While relatively 
small bias and standard errors of q n {u) are desirable, our primary goal is good coverage 
probability of the associated confidence intervals [L, U] defined in ([I| and secondary goal of 
small standardized widths n 1 / 2 (f/ — L); thus these quantities will be estimated by simulations 
and recorded. 


4.1 Study 1: Performance of optimal bandwidth estimators for 
symmetric unimodal F with infinite support 

Here we compare four optimal estimators for the normal, logistic, Laplace and Cauchy quan¬ 
tile functions q(u), and the coverage and widths of the associated large sample 95% confidence 
intervals (|TJ) for x u = Q(u), for u 6 [0.05,0.95]. All four intervals will be compared for data 
generated from all four distributions, to see if any of the intervals gives acceptable results 
for all of them. Preliminary examination of biases of these optimal q{u) s reveals that all are 
biased upwards over this range of study u G [0.05,0.95], with the optimal for Cauchy being 
the least biased. The standard errors of the q[u) s are very similar, and so are the average 
widths of the associated intervals. For n = 100, the average width of any of these intervals is 
approximately one for u = 0.5, and grows to 4 at u = 0.15 (or u = 0.85) and is 10 for u = 0.05 
(or u = 0.95). In any case, for each u the intervals decrease in size at the rate of n ~ 1/<2 , so 
these facts should be kept in mind in deciding whether a given sample size is adequate for 
estimating a given quantile x u to a given accuracy. 

In Figure [3] a comparison of the empirical coverages of these intervals is shown for sample 
size n = 400 from each of the Cauchy, normal, Laplace and logistic distributions. The 
confidence interval based on the optimal Cauchy estimator for the quantile density function 
performs best, with coverage probability near the nominal value for all u G [0.1, 0.9] for all 
four generated data sources. The other three intervals have conservative coverage over this 
range; they possibly could be improved by making finite sample adjustments. If n is reduced 
to 100, simulations (not shown) demonstrate that only the optimal for Cauchy intervals have 
coverage near 95% for these 4 settings, and then only over the range [0.2,0.8]; the other 
intervals are far too conservative to be of practical value. For n = 200, the range of useful 
coverage of the optimal Cauchy intervals is [0.15,0.85]; and for n = 800 this range extends 
to [0.05,0.95], 

In addition, the same optimal QOR for Cauchy interval gives close to 95% coverage for 
u G [0.1, 0.9] when sample sizes are n = 400 from the symmetric Tukey distribution and A 
ranging from —1 to 5, although confidence coefficients fall slightly below the nominal 95% 
for the uniform cases A = 1 and A = 2. A thorough analysis, including comparison with an 
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optimal for the Tukey A family, with A estimated from the data, would be of interest but is 
beyond the scope of this work. 

In summary, the optimal for Cauchy QOR method is recommended for symmetric uni- 
modal distributions with infinite support, and this is Method A under such an assumption. 

4.2 Study 2: Performance of optimal bandwidth estimators for 
skewed F with support on [0, +oc) 

In the second study we generated data from skewed models often postulated for positive 
income data: the lognormal, exponential, and Type II Pareto models with shape parameters 
a = 1 and a = 2. For these four models we compare the confidence intervals associated with 
the optimal QOR estimators of q(u) for the first three of these models and with those of 
an adaptive method that first estimates the Pareto shape parameter a by the MLE a = 
n/ JT ln(l + xf) and then uses the optimal bandwidth QOR a(u). 

Important boundary correction: Preliminary simulations showed that none of the above 
intervals had satisfactory coverage for smallish u < 0.2 because the “optimal” bandwidths 
were extending beyond the lower boundary of [0, +oo). This problem was successfully resolved 
by taking the bandwidth to be b = min{u,b}. Hereafter we make this boundary correction to 
optimal QOR bandwidths for distributions with support [0, +oo). 

One can see from Figure [4] that when n = 400 no interval is uniformly better in terms of 
coverage than the others, and all have similar coverage for small u because of the boundary 
correction. In general, the optimal QOR for exponential intervals have too conservative 
coverage, while the other three intervals perform satisfactorily for u G [0.05,0.95]. 

In practice an adaptive Pareto model will also require estimation of scale, so we also 
tried using maximum likelihood estimates for unknown scale and shape, but found this ap¬ 
proach unsatisfactory because the likelihood was often flat, leading to unreliable estimates 
of shape and much worse performance of the associated optimal for Pareto QOR intervals. 
To summarize, as a result of Study 2, in the context of data with known support [0, +oo) we 
recommend using the optimal QOR for lognormal or the Pareto(a = 1) method for quantile 
interval estimation. We take Method A in this context to be the optimal QOR for lognormal. 
We do not discuss Method B further in this paper. 
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Figure 4: n=400 Estimated coverage probabilities based on 10,000 replicates of four 95% 
confidence intervals for x u = Q(u), based respectively on optimal QOR’s for the Lognormal 
(solid lines), the exponential (dashed lines), the Pareto(a = 1) (dotted lines) and the adaptive 
Pareto, which uses the MLE a to determine the QOR, (dot-dashed) lines. The four plots are 
for data generated from the Lognormal, exponential, Pareto(a = 1) and Pareto(a = 2). 


4.3 Study 3: Comparison of various intervals estimators based on 
QOR for generalized lambda models 

In this Section we compare the optimal QOR for the Cauchy and/or lognormal (Method A) 


with Methods C,D of Su (2009) and our Method E, adaptive QOR-GLD for u G [0.05,0.95]. 
We consider estimation for the standard Cauchy(0,l), the heavy-tailed G-H(0.2, 0.2) (see 


Tukey, 1977; Hoaglin, 1985 Hendrick et al. 2008), (also considered by Su (2009) for u = 
0.05,0.25,0.50,0.75 and 0.95), the standard lognormal LN(0,1) and the exponential EXP(l) 
distributions for sample sizes n = 100 and 500. We make our comparisons with empirical 
coverage probabilities from 10,000 simulated runs for each choice of u. Widths of these 
intervals are very similar and are therefore not reported. 
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For estimation of the GLD parameters, we use maximum likelihood estimates for the 
FMKL parameterization (e.g. Su 2007). While many other estimators of the GLD parameters 
are available, after making several comparisons Corlu & Meterelliyoz (2015) find that the 
MLE estimators are a good choice since they often result in decreased estimator variability. 
However, they note that there are ongoing improvements to GLD parameter estimation so 
that the results that follow could later be improved with the introduction of better estimators. 


4.3.1 Single population quantiles 





Figure 5: Simulation results for coverage probability for the Cauchy, G-H(0.2, 0.2), LN(0,1) 
and EXP(l) distributions, plotted as a function of u, for samples sizes n = 100 (left) and 500 
(right). 
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The simulated coverage probabilities for the various interval estimators of the quantiles are 
depicted in Figure [5] for u = 0.05 : 0.95/0.01. For the QOR approach to choosing the optimal 
bandwidth, we include the QOR for the lognormal (A-LN; thick grey dot-dash), the Cauchy 
(A-Ca; dot grey) and the adaptive GLD (E; thick grey line). We also consider Methods C (C; 
black dot-dash) and (D; black line) of Su (2009). For the Cauchy and lognormal distributions 
we expect the Cauchy QOR and lognormal QOR, respectively, to have suitable bandwidths 
that achieve close to nominal coverage and this is indeed the case for both sample sizes 
n = 100 and 500 for all u. However, it can be seen from the coverage plots for the G-H 
distribution that a poorly chosen QOR can result in inadequate coverage. The lognormal 
QOR when used for the symmetric Cauchy and the approximately symmetric G-H(0.2,0.2) 
distribution results in very poor coverage for small u and n = 100 and coverage that is too 
conservative when n = 500 and for u < 0.4. Surprisingly, the optimal Cauchy QOR typically 
performs well for both the symmetric and non-symmetric distributions. 

While we expect the Cauchy QOR to work well for symmetric distributions, its simplicity 
and performance for these non-symmetric distributions means that it may be an attractive 
option in general. However, rather than explore the Cauchy QOR further, it should be 
pointed out Method E, the adaptive GLD-QOR (thick grey line), performs remarkably well 
for all distributions considered and for both n = 100 and n = 500. Given that the GLD can 
approximate many distributions, these results suggest that in comparison to the other QOR 
approaches, it provides an excellent means to obtain a good bandwidth for the problem at 
hand. It should also be noted that GLD only needs to approximate the true underlying dis¬ 
tribution reasonably well in order to End a good bandwidth and consequently good coverage. 
This is not necessarily the case other methods based on the GLD and we will comment on 
this next. 

While Methods C and D are often slightly conservative for n = 100, the performance of 
these methods may suffer for larger n. As noted by Su (2009), their performance depends 
on the ability of the GLD to approximate the true underlying distribution. Consequently, 
because bias in the density estimation does not diminish with increasing sample size, it can 
lead to poor coverage, especially when n is large. 

To assess the ability of the GLD distribution to adequately approximate other distribu¬ 
tions, we compare the GLD quantiles with the quantiles from the distributions that it is 
seeking to approximate. In Figure [6] we plot the differences between simulated GLD quan¬ 
tiles versus the actual quantiles for the LN(0,1) and EXP(l) distributions for varying u and 
three choices of sample sizes. The differences are reported as a percentage difference with a 
negative value indicating an under-approximation for the GLD quantile. The simulated GLD 
quantile was obtained over 1000 simulation runs. For the LN(0,1) distribution, it can be seen 
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A: LN(0,1) 


B: EXP(1) 




u u 

Figure 6: Differences (as a percentage) between simulated GLD quantiles and the actual 
quantiles for the true distribution. The LN(0,1) and EXP(l) distributions are considered for 
three sample sizes, 100, 200 and 500; in this study there are 1000 replications. 

that the bias in the GLD does not necessarily decrease as n increases. For the EXP(l) dis¬ 
tribution, while the bias does decrease for increasing n, it is still evident for n = 500. Given 
that Methods C and D are assuming that the GLD is a consistent estimator for the true 
underlying density, the persistent bias for some distributions and with large n can explain 
why these methods can return poor coverage as n increases. 

Nevertheless, our investigations suggest that Methods C and D are comparatively very 
good interval estimators for smaller n. In Figure [7] we depict the simulated coverages for 
several methods when the sample size is just n = 50. While the QOR approaches perform 
reasonably well provided that u is neither small or large, the coverage can otherwise be very 
poor. This is understandable for small n where direct estimation of the quantile density is 
being attempted in regions with very few observations. On the other hand, Methods C and 
D are comparatively quite good across a wide range of u. For these methods, replacing the 
unknown density with the estimated GLD density works well since the quantile density is 
not heavily effected by sparsity of points in some regions of u. That is, all 50 observations 
are used in the estimation of the four GLD parameters and provided that these estimates are 
reasonably accurate, then the methods should perform well with small n. 

4.3.2 Extension to comparing quantiles from two-populations 

Generalizing ([!]) it is straightforward to construct large sample interval estimators for linear 
combinations of quantiles from independent populations. For example, let y p denote the pth 
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Figure 7: Simulation results for coverage probability for the Cauchy(0,1), G-H(0.2,0.2), 
LN(0,1) and EXP(l) distributions for varying u and samples size n = 50. 


quantile for a new population where 0 < p < 1 and denotes its estimator y p based on a 
random sample of size m. Since x u and y p are independent, then a large sample confidence 
interval for x u — y p is 


{x u 


Vp) i Z\~ a /2 



+ 


V 

m 


( 12 ) 


where mNax[y p ] = v p . 

In Figure [8] we provide simulated coverage probabilities for for Methods A (with lognormal 
and Cauchy QOR) and the GLD Methods C and E. With the exception of when using the 
lognormal QOR for symmetric (or close symmetric distributions as is the case for the G- 
H(0.2, 0.2) distribution) the methods typically return good coverage for most choices of u. 
In particular, Method E which used the GLD QOR provides the most stable coverage with 
coverage probabilities rarely below nominal and usually between 0.95 and 0.97 for both choices 
of n. 


4.4 Study 4: Methods F,G and H 


Methods F,G and H are due to Soni et al. (2012b) and their simulation studies for sample 
sizes 200 from the exponential (their Table 3.1) and GLD(1, —1, —1/8, —1/8) (their Table 3.2), 
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Figure 8: Simulation results for coverage probability for the differences in quantiles from 
several distributions distributions for varying u and samples sizes n = 100 (left) and n = 500 
(right). 

comparing the MSE of the following estimates F,G, and H of q(u) for both Epanechnikov 
and triangular kernels, with 3 choices of constant in u bandwidth h n = 0.15,0.19 and 0.25. 
Neither kernel gave uniformly better results and the bandwidth results are not that dependent 
on choice of constant. Their Table 3.1 shows that their Method H has smaller MSE than 
Methods F and G for u = 0.2, 0.4, 0.6, 0.8 and 0.9. We tried to replicate their results but 
obtained only partial agreement that their H is preferable to F in terms of MSE. We also 
included Method E, the adaptive QOR for the GLD distribution for comparison, and found 
it competitive for this sample size in the context of exponential data. More importantly, for 
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larger n = 500 say, the bias and MSE of their Method F estimator is larger than for n = 200 
when u = 0.8 or u = 0.9. This suggests that with constant bandwidth the bias of all their 
estimators will grow without bound, leading to small coverage probabilities of the associated 
confidence intervals. One would expect that for all of their methods F,G and H a bandwidth 
of order n~ 1//5 is required for minimizing the MSE, but there is no discussion of bandwidth 
dependence on n in their paper, although it is denoted h n . 


4.5 Study 5: Example of Heart-rate data 


Su (2009) compared heart rate observations for males with those for females where there are 


65 observations in each group. The data set does not explicitly state which data belongs to 
which group so we will follow Su’s lead and refer to simply gender 1 and gender 2. Since 
the maximum likelihood estimates for the RS GLD parameterization resulted in the best QQ 
plots, it was this parameterization that was used and that we will subsequently use here. Su 
reported that the modes for the fitted distributions are located at the 0.48 and 0.64 quantiles 


for gender 1 and gender 2. Based on the parameter estimates reported by Su (2009), we were 


able to replicate the reported 95% confidence intervals for the population quantiles given as 
[71.02, 74.81] and [74.89, 80.33] respectively. As noted by Su, the lack of overlap for these 
intervals indicates a significant difference between the quantiles. 

A potential problem with the reliance on separate conEdence intervals to determine a 
difference between two independent groups is in what to do conclude when they do overlap. 
Overlapping confidence intervals can still indicate a significant difference and the intervals re¬ 
ported above were, in fact, very close to overlapping. We will instead use the QOR approach 


based on the GLD to hnd an interval for the diEerence in quantiles (see Section 4.3.2) as well 
as separate intervals for each group. The QOR for the RS GLD can be found in Table [l] For 
gender 1 and gender 2 our separate 95% intervals are [70.94, 75.06] and [75.30, 81.00] respec¬ 
tively which are similar to those reported above. For the diEerence between the quantiles 
between gender 2 and gender 1, our 95% confidence interval is [1.63, 8.67] which suggests a 
positive diEerence. 

To emphasize our point about the advantages of being availed a single interval for the 
diEerence in quantiles, we increase the confidence level to 99% and report the intervals in 
Table [2j The intervals are, as expected, wider and the gender 1 and gender 2 intervals now 
overlap for both methods. However, the 99% confidence interval for the diEerence in quantiles 
using the QOR approach is free of zero and still indicative of a diEerence between the groups. 
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Analytical 


QOR 


gender 1 

gender 2 

gender 1 

gender 2 

gender 2 — gender 1 

(70.44, 75.42) 

(73.95, 81.08) 

(70.29, 75.71) 

(74.4, 81.89) 

(0.52, 9.77) 


Table 2: 99% confidence intervals for the quantiles for gender 1 and gender 2 (for the the 
analytical and QOR methods) and for the difference between the gender 2 and gender 1 
quantiles using the QOR method. 


5 Summary and discussion 


A kernel estimator of the quantile density q(u) that has been the subject of investigation 
by several authors is known to have optimal, (in the sense of minimizing the asymptotic 
mean squared error at u ), bandwidth that depends on the underlying location-scale family 
only through the quantile optimality ratio QOR(w) = q{u)/q"{u). By examining numerous 
examples of this ratio, we found it to be relatively well behaved (compared to q and its 
derivatives) with graph similar in shape to the square of the density quantile f(x u ). The 
consequence for estimation of q(u ) needed in construction of the simple confidence intervals 
(JT|) is that q"(u ) need not be estimated. Rather, a representative QOR that is optimal for 
one family turns out to be more than adequate for many similarly shaped families. We called 
the representative bandwidth approach Method A. 

In particular for symmetric unimodal distributions with infinite support, we found that 
using the optimal for Cauchy QOR bandwidth led to relatively good coverage for all u G 
[0.05, 0.95] and n > 400. For smaller n , the good coverage range of u diminishes, but is 
still competitive with most intervals. For unimodal skewed-right distributions with support 
on [0, oo) a similar result holds: the optimal for lognormal QOR (or Pareto(a = 1) model) 
provides a bandwidth leading to very good coverage for u G [0.05, 0.95] for a variety of 
commonly assumed distributions. In this case a very simple boundary correction is required 
for the bandwidth. Method A quantile density estimates may also improve the performance 


of ratios of linear combinations of quantiles such as the standardized median Staudte (2013) 


or skewness measures Staudte (2014). In those papers a truncated kernel density estimator 


for f(x u ) was the basis for estimating its reciprocal q(u), but a small simulation study shows 
that the direct estimators of q{u) proposed herein are superior, especially for small or large 
u. 

Method B was less successful. It tried assuming much less, such as a symmetric Tukey A 
model, first estimating the parameter and then using optimal QOR^(w). We also tried one 
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and two-parameter Pareto models, but finding the optimal QOR for the estimated model 
appears to be a more complicated procedure with less success. 

Some researchers are willing to assume the four-parameter generalized lambda family of 
distributions. We found that by estimating the parameters first and then using the optimal 


QOR for this GLD led to relatively good coverage, compared to Methods C and D of Su 


(2009), of confidence intervals for n > 100. We called this Method E. We showed that Methods 
C and D are unlikely to be useful for larger n if the underlying distribution was not in the 
GLD, because the quantiles of the estimated GLD do not converge to the true quantiles. 
Nevertheless, for small n we recommend Method C, which has remarkably good coverage 
over a large range of u. Further research into parameter estimation for the GLD could lead 
to improvements in Method E. 

The distribution-free Methods F,G and H Soni et al. (2012b) which use a constant band¬ 
width were found uncompetitive. The only constant bandwidth QOR that we could find 
arose from a bimodal distribution with finite support, see the Appendix in Section [6j 

Finally, we showed that it is easy to extend these results to two-sample problems where 


it is desired to compare quantiles from independent samples. For the heart-rate data of Su 


(2009), we found a single interval for the difference in quantiles which is easier to carry out 
tests for a significant difference in quantiles, rather than relying on the more complicated test 
based on separate confidence intervals. 

In summary, examination of the QOR for many parametric families shows that the optimal 
bandwidth based on it need not be very accurate to obtain a good estimate of the quantile 
density required for simple distribution-free confidence intervals. At the same time the shape 
of the QOR cannot be ignored if one desires good coverage of confidence intervals for quantiles, 
for moderate and large sample sizes. 

With regard to further research, an important class of models for which this methodology 
may prove useful is the mixtures of normal distributions, for which one can easily obtain 
parameter estimates using existing software packages. One would need an expression for, 
and an estimate of, the QOR in this case, in order to obtain interval estimates for quantiles. 
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6 Appendix. Additional examples of the QOR 


The distributions below are described in Johnson et al. (1994) and Johnson et al. (1995). 


Cauchy. 

For F{x) = 0.5 + arctan(x)/7r the quantile function is x u = Q{u) = tan(7r(w — 0.5)). Hence 
q{u) = 7rsec 2 {7r(n — 0.5)} 

q\u) = 27T 2 [tan{7r(u — 0.5)} + tan 3 {7r(u — 0.5)}] 
q"{u) = 27t 3 sec 2 {7r(w — 0.5)} [l + 3tan 2 {7r(n — 0.5)}] 

Elementary calculations show that this ratio equals to l/( 27 T\/ 3 ) times the Cauchy density 
having median 0 and scale parameter l/\/3, evaluated at the quantile function. A formula 
for the QOR is given in Table [T} 


Laplace. 

If f(x) = e — 1^1/2, the quantile function is Q(u) = ln(2 u) for 0 < u < 1/2 and Q(u) = 
— ln{2(l — u)} for 1/2 < u < 1, so the quantile optimality ratio is QOR{u) = u 2 /2 for 
0 < u < 1/2 and (1 — u ) 2 /2 for 1/2 < u < 1; its graph is shown in the upper right plot of 
Figure [I] Elementary calculations show that QOR(u ) = fi/ 2 (x u )/ 8 , a scale multiple of the 
Laplace distribution with median 0 and scale 1/2, evaluated at the quantile function. 


Logistic. 

The density function f(x) = e~ x {l + e ~ x } 2 for all x and the quantile function Q(u) = 
ln('u) — ln(l — u), for 0 < u < 1. It follows after taking its derivatives that the QOR is given 
by 2{ (1 — u ) 3 — u 3 }/{u(l — m)} 2 . 


Bimodal with constant ratio. 


Since at least one publication Soni et al. (2012a) used a constant ratio in the definition of 
the optimal bandwidth Q, we looked for a density / which leads to a constant QOR. One 
possibility is f{x) = (2(e — |x|)} _1 for all |x| < e — 1. It has constant QOR(u) = 1/4. However 
when the QOR in the bandwidth is constant the ensuing estimator of q(u) is poor for most 
u compared to many others, so we did not consider it further. 
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Tiikey A. 


Recall that the Tukey(A) distributions have quantile density function given by q(u) = u x ~ l + 
(1 — u) A_1 for all A. There is no closed form for the density / itself except when A = 0 and 
the distribution F is logistic, or when A = 1 or 2 and the distribution is uniform. When 
A = 2.5, the quantile optimality ratio q(u)/q"(u ) is approximately constant over the range 
0.1 < u < 0.9. 


Pareto. 

For x > 0 let F(x] a) = 1 — (1 + x)~ a , where a > 0 is a shape parameter; such distributions 
are called Lomax or Pareto Type II. The quantile function x u = Q(u) = (1 — u ) _1 / a — 1, for 
0 < u < 1; and, the quantile optimality ratio QOR(w) = a 2 (l — w) 2 /{(l + a)(l + 2a)}. 


Gamma. 


Recall the gamma distribution has density f(x; a) = x a ~ l e~ x /T(a) for x > 0, where a > 0 is 
the shape parameter and T(a) is the gamma function. Using ([5]) and the fact that f'(x\ a) = 
g(x)f(x ; a), where g(x) — (a — l)/x — 1, one finds the quantile optimality ratio: 


QORJu) 


_ x 2 u f 2 {xu,a) _ 

(2 a — l)(a — 1) — 4(a — l)x u + 2x 2 


(13) 


Note that this ratio is negative for some u if 0.5 <«<!.. 


Weibull. 


For (3 > 0 the Weibull distribution function is defined for each x by F(x; j3) — 1 — e~ x . 
Clearly f(x; (3) = g(x)f(x ; (3), where g{x) = {(3 — l)/x — (3 /x^ -1 . Again applying ([5j yields 
the result: 


QOR^u) 


_ xlf 2 (x u ;P) _ 

(2(3 - l)((3 - 1) - 3 - l)xi + 2^xf ' 


(14) 


The special case of the Weibull when (3 — 1 is of interest because its ratio QOR(u) = 
(1 — u) 2 / 2, which is the same form as the ratio for the Pareto II family. However, for any 
a > 0 the coefficient of (1 — u ) 2 for the Pareto II (a) distribution is less than 1/2. 
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